Comparative genomic analysis of Streptomyces rapamycinicus NRRL 5491 and its mutant overproducing rapamycin

Streptomyces rapamycinicus NRRL 5491 is a well-known producer of rapamycin, a secondary metabolite with useful bioactivities, including antifungal, antitumor, and immunosuppressive functions. To enhance rapamycin production, a rapamycin-overproducing strain, SRMK07, was previously obtained through random mutagenesis. To identify the genomic changes underlying the enhanced rapamycin production of the SRMK07 strain, the genomes of the NRRL 5491 and SRMK07 strains were newly sequenced in this study. The resulting genome sequences of the wild-type and SRMK07 strains were 12.47 Mbp and 9.56 Mbp in size, respectively. Large deletions were observed at both end regions of the SRMK07 strain's genome, which cover 17 biosynthetic gene clusters (BGCs) encoding secondary metabolites. In addition, genes in a genomic region containing the rapamycin BGC appeared to be duplicated. Finally, comparative metabolic network analysis using genome-scale metabolic models of the two strains revealed biochemical reactions with different metabolic fluxes, all of which were associated with NADPH generation. Taken together, the genomic and computational approaches undertaken in this study suggest biological clues for the enhanced rapamycin production of the SRMK07 strain. These clues can also serve as a basis for systematic engineering of a production host for further enhanced rapamycin production.


Results
Rapamycin production and growth of the SRMK07 strain. The rapamycin-overproducing mutant, S. rapamycinicus SRMK07, produced approximately 207 mg/L rapamycin, or around fourfold more rapamycin than the NRRL 5491 strain (Fig. 1a). Despite the significant improvement in rapamycin production, the SRMK07 strain showed almost normal growth in comparison to the wild-type (Fig. 1b). According to packed mycelium volume (PMV), the wild-type and the SRMK07 strain reached the peak of biomass accumulation on the fourth and sixth day, respectively. Additionally, both strains were grown on ISP2 and M1 plates to examine the morphology of their colonies and their sporulation patterns, respectively (Fig. 1c and "Methods" section). Both strains grew well on ISP2 plates, but the wild-type colonies were greater in size; neither strain sporulated on ISP2 plates. Meanwhile, on M1 plates, the wild-type formed spores, while the SRMK07 strain did not show any indication of sporulation.
Whole genome sequencing of S. rapamycinicus NRRL 5491 and the SRMK07 strain. We next conducted whole genome sequencing (WGS) of the NRRL 5491 and SRMK07 strains, using both PacBio and Illumina platforms, to identify genomic changes in the SRMK07 strain that have led to its high production performance of rapamycin. The resulting whole genome sequences of the wild-type and the SRMK07 appeared to be 12.47 Mbp and 9.56 Mbp, respectively (Fig. 2). In contrast to the SRMK07 strain's genome that was initially obtained as a single contig, the wild-type's genome data were obtained as seven contigs. To resolve this problem, two independent sequences of S. rapamycinicus NRRL 5491 genome, currently available in the NCBI database (GCA_003675955.1 25 and GCA_000418455.1 26 ), were used as references to connect the seven contigs of our wild-type's genome. Among these two sequences, only GCA_000418455.1 is represented as a single contig although it contains multiple sequencing gaps. In contrast, the second sequence (i.e., GCA_003675955.1) contains four contigs, but fortunately, lacks sequencing gaps. Therefore, we utilized GCA_000418455.1 as an initial framework to determine the correct order of our own NRRL 5491 seven contigs, while GCA_003675955.1 served as a template to fill any sequencing gaps. The assembled wild-type genome sequence showed a size (12.47 Mbp) comparable to that of GCA_000418455.1 (12.70 Mbp).
Comparative genomic analysis based on Illumina reads mapping. Comparison of the genomes of the wild-type and the SRMK07 strain showed a difference of about 2.91 Mbp; 10,140 protein-coding genes were predicted from the wild-type genome, while only 7757 protein-coding genes were predicted from the SRMK07 genome (Fig. 2), which indicates large genomic deletions in the SRMK07 genome as a result of the random mutagenesis. To further analyze the genomic differences between the wild-type and the SRMK07 strain, Illumina sequencing reads of the SRMK07 strain were mapped on the assembled wild-type genome (Fig. 2) […] experiments, primers were designed to target genes located in either end region of the wild-type's genome, which were expected to be absent in the SRMK07 strain's genome. Indeed, as a result of the PCR, target bands with expected size were obtained only from the wild-type's genomic DNA (gDNA), and not from the SRMK07 strain's genome (Supplementary Fig. S1).
Figure 2. Profiles of Illumina reads from S. rapamycinicus NRRL 5491 (wild-type) and the SRMK07 strain, mapped on the wild-type's genome assembled in this study. (a) Profile of Illumina reads from the wild-type. (b) Profile of Illumina reads from the SRMK07 strain. The data were visualized using SignalMap (Roche NimbleGen, Inc., Pleasanton, CA). 'A' and 'A'' indicate the potentially deleted regions, and 'B' indicates the potentially duplicated region in the SRMK07 strain's genome. Dashed lines indicate the locations of target genes for the relative quantification analysis using qPCR (Fig. 3) as well as the deleted core genes […].
However, further thorough analysis will be necessary to confirm these potential genomic deletions in the SRMK07 strain because Streptomyces species have a linear chromosome with both ends having terminal inverted repeats (TIRs), and these TIRs make firm mapping of the borders of the deletions difficult. Full information on conflict positions in nucleotide sequences as well as missing genes in the SRMK07 strain in comparison with the wild-type is available in Supplementary Data 1. We subsequently examined whether secondary metabolite BGCs in the SRMK07 strain were affected by the genomic deletions by running antiSMASH 5.0 for genomes of the wild-type and the SRMK07 strain 27 . As a result, the wild-type was predicted to have 52 BGCs, whereas 17 BGCs appeared to be lost in the SRMK07 strain (Table 1). Since nine of these missing BGCs encode polyketides or hybrids of non-ribosomal peptides and polyketides, the loss of these BGCs in the SRMK07 strain might have enhanced the production of rapamycin by redirecting precursors necessary for the rapamycin biosynthesis.
Interestingly, a genomic region (9,783,802-10,695,700 bp in the wild-type genome) was observed in the SRMK07 strain where the number of the mapped reads was notably greater than other regions of the genome (1,174,783,802 bp in the wild-type genome) by approximately twofold (the region 'B' in Fig. 2b); this read-depth pattern strongly indicates duplication of the genes in this region. Since this genomic region in the SRMK07 strain covers the rapamycin BGC (8,583,830,075 bp in the SRMK07 genome, which corresponds to 9,758,976-10,004,25 bp in the wild-type genome), duplication of genes in this region might have also contributed to the enhanced rapamycin production. To further verify the duplication of this genomic region, we performed real-time PCR (qPCR) for relative quantification of genes from the potentially duplicated region in comparison with genes from other regions that are known to exist as a single copy across various Streptomyces species (Fig. 3, Supplementary Table S2); information on single-copy genes was obtained from OrthoDB 28 . For this analysis, five reference single-copy genes were selected, which encode: NADH-quinone oxidoreductase subunit H; RtcB family protein; RNA helicase; aspartate kinase; and type I DNA topoisomerase. Likewise, eight genes were selected from the potentially duplicated region in the SRMK07 strain's genome, which encode: 3-ketoacyl-CoA thiolase; regulatory protein AfsR; L-lysine cyclodeaminase; ferredoxin; glycerol uptake operon antiterminator regulatory protein; a hypothetical protein encoded by a rapamycin biosynthetic gene; SDR family oxidoreductase; and putative ABC transporter ATP-binding protein YbiT. L-lysine cyclodeaminase (encoded by rapL), ferredoxin (encoded by rapO), and the hypothetical protein all belong to the rapamycin BGC.
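The read-depth reasoning used to flag the duplicated region can be illustrated with a small sketch. It is not the authors' pipeline (the profiles were inspected in SignalMap); the window depths, the cutoffs, and the function names below are hypothetical and chosen only to show how a roughly twofold depth ratio points to duplication and a near-zero ratio points to deletion.

```python
from statistics import median


def depth_ratio(depths, region, baseline):
    """Median read depth in `region` divided by median depth in `baseline`.

    depths   : per-window read depths along the reference (hypothetical data)
    region   : (start, end) window indices, end exclusive
    baseline : (start, end) window indices, end exclusive
    """
    region_depth = median(depths[region[0]:region[1]])
    baseline_depth = median(depths[baseline[0]:baseline[1]])
    if baseline_depth == 0:
        raise ValueError("baseline depth is zero; choose another baseline")
    return region_depth / baseline_depth


def classify(ratio, dup_cutoff=1.5, del_cutoff=0.1):
    """Label a region from its depth ratio; both cutoffs are illustrative."""
    if ratio >= dup_cutoff:
        return "duplicated"
    if ratio <= del_cutoff:
        return "deleted"
    return "single-copy"


# Toy profile: empty chromosome ends, a single-copy core, and a ~2x block.
toy_depths = [0, 0, 1, 98, 102, 100, 99, 101, 205, 198, 201, 0, 0]

ratio_b = depth_ratio(toy_depths, region=(8, 11), baseline=(3, 8))
ratio_a = depth_ratio(toy_depths, region=(0, 3), baseline=(3, 8))
print(classify(ratio_b), classify(ratio_a))
```

Medians are used rather than means so that a few outlier windows do not dominate the ratio; this is a design choice of the sketch, not something stated in the study.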
The qPCR results with the gene encoding NADH-quinone oxidoreductase subunit H as a control showed that the relative quantities of the reference single-copy genes in the SRMK07 genome ranged from 0.7 to 1.3, while the relative quantities of the genes from the potentially duplicated region were close to 2 (Fig. 3). These results strongly suggest the duplication of the genomic region in the SRMK07 strain where the rapamycin BGC is located.
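A relative quantity close to 2 is what a duplicated gene would give under the commonly used 2^(-ΔΔCt) calculation. The study only states that the manufacturer's protocol was followed, so the use of this formula, the assumed amplification efficiency of 2, and the Ct values below are assumptions made purely for illustration.

```python
def relative_quantity(ct_target_mut, ct_control_mut,
                      ct_target_wt, ct_control_wt, efficiency=2.0):
    """Relative gene quantity in the mutant versus the wild-type.

    Uses the delta-delta-Ct approach: each target Ct is first normalized
    to the control gene within the same strain, then the mutant is
    compared with the wild-type. `efficiency` is the assumed fold
    increase per PCR cycle (2.0 means perfect doubling).
    """
    delta_mut = ct_target_mut - ct_control_mut
    delta_wt = ct_target_wt - ct_control_wt
    return efficiency ** (-(delta_mut - delta_wt))


# Hypothetical Ct values: the target crosses the threshold one cycle
# earlier in the mutant, relative to the control gene.
duplicated = relative_quantity(19.0, 20.0, 20.0, 20.0)
unchanged = relative_quantity(20.0, 20.0, 20.0, 20.0)
print(duplicated, unchanged)
```

With gDNA as template, one cycle of difference corresponds to a twofold difference in copy number, which is why values near 2 for the region 'B' genes and near 1 for the reference genes support a duplication.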
Table 1. Biosynthetic gene clusters (BGCs) that appeared to be absent in the SRMK07 strain's genome. a BGCs were detected using antiSMASH version 5.0 27 . b Metabolites in red are polyketides or hybrids of non-ribosomal peptide and polyketide. c Non-ribosomal peptide synthetase.
Core gene analysis. The SRMK07 strain showed normal growth despite the large genomic deletions. This observation raised a question about the presence of core genes in this strain that are necessary for normal growth; the core genes here refer to those present in genomes of the vast majority of biologically related organisms, for example, Streptomyces species in this study, likely because of their biological importance 29,30 . To examine the distribution of core genes in the SRMK07 strain's genome, a software program 'Antibiotic Resistant Target Seeker' (ARTS) was used, which allows the detection of core genes, including housekeeping genes and resistance genes associated with BGCs 31 . As a result, ARTS predicted 393 and 389 core genes (out of 10,140 and 7757 protein-coding genes, respectively) from genomes of the wild-type and the SRMK07 strain, respectively (Table 2, Supplementary Data 2). Hence, only four core genes were predicted to be missing in the SRMK07 strain. These four genes are TIGR01235, TIGR03442, TIGR03438, and TIGR00173 (all TIGRFAM identifiers 32 ), which encode: pyruvate carboxylase; ergothioneine biosynthesis protein EgtC (or γ-glutamyl-hercynylcysteine sulfoxide encoded); ergothioneine-biosynthetic methyltransferase EgtD (or histidine N-alpha-methyltransferase); and 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase MenD, respectively. It should be noted that no evidence for paralogs of these four genes was found in the SRMK07 genome according to ARTS and OrthoDB. A close examination of metabolic genes in the SRMK07 strain suggested that the loss of these four core genes should not affect the overall metabolic activities of the SRMK07 strain.
Pyruvate carboxylase (TIGR01235) is an anaplerotic enzyme, and is involved in regulating a phosphoenolpyruvate (PEP)-pyruvate-oxaloacetate pool that is critical for the optimal distribution of metabolic fluxes in central carbon metabolism 33 . Despite the loss of this gene, other genes involved in regulating the PEP-pyruvate-oxaloacetate pool were still present in the SRMK07, including PEP carboxykinase, PEP carboxylase, malic enzyme, and malate dehydrogenase. Next, EgtC (TIGR03442) and EgtD (TIGR03438) are involved in the biosynthesis of ergothioneine that detoxifies reactive oxygen species and reactive nitrogen species for redox homeostasis, and the absence of ergothioneine results in higher oxidative stress 34,35 . Biological roles of ergothioneine in Gram-positive bacteria can be complemented by other detoxifying molecules, such as mycothiol, and glutathione 34,35 . Our genomic analysis of the wild-type and the SRMK07 showed that both strains carry intact genes for the biosynthesis of mycothiol 36 (i.e., mshA, mshB, mshC, and mshD) and glutathione. Finally, menD (TIGR00173) is part of the genes, menABCDEFGH, that encode the biosynthesis of menaquinone; menaquinone plays an important role in the electron transport in Gram-positive bacteria 37 . Fortunately, an alternative biosynthetic pathway for menaquinone, known as futalosine pathway, has also been reported in Streptomyces coelicolor [38][39][40] . Homologs of the genes in this futalosine pathway were found to be present in both the wild-type and the SRMK07 strain (Supplementary Table S3).
Comparative metabolic network analysis. Enhanced production of a secondary metabolite might also be linked with changes in the metabolic network that provides precursors and energy molecules necessary for rapamycin biosynthesis. To examine this possibility, we reconstructed GEMs, SrapWT2040 and SrapUV2010, that describe the metabolism of the NRRL 5491 and SRMK07 strains, respectively (Supplementary Data 3,4). In contrast to the 2383 protein-coding genes that appeared to be deleted in the SRMK07 strain as a result of the random mutagenesis, the constructed SrapUV2010 appeared to have only 30 fewer biochemical reactions than SrapWT2040; these 30 biochemical reactions are associated with a total of 372 metabolic genes (Fig. 4a, Table 3). As expected, biochemical reactions associated with the pathway 'Biosynthesis of secondary metabolites' were shown to be most affected in SrapUV2010; six of the 30 reactions missing in SrapUV2010 belong to this pathway. Additional metabolic pathways that were affected by the random mutagenesis include: central carbon metabolism (i.e., fructose, mannose, glyoxylate, and TCA cycle), amino acid metabolism (i.e., phenylalanine, arginine, proline, and glycine), and degradation pathways (i.e., benzoate, styrene, and polycyclic aromatic hydrocarbon).
Table 2. Deleted core genes in the SRMK07 strain's genome. a Identifiers (IDs) of the detected core genes were obtained from ARTS, which are mostly TIGRFAM IDs 32 . b 'Duplication' indicates the presence of a gene with a greater copy number than the average copy number of this gene present in other organisms.
To gain insights into the metabolic effects of losing these 30 reactions, we first conducted gene/reaction essentiality analysis for the two GEMs, SrapWT2040 and SrapUV2010 (Fig. 4b). We found that none of these 30 reactions were essential for the growth of the wild-type and the SRMK07 strain. Overall, SrapUV2010 showed a slightly greater number of essential genes and essential reactions than SrapWT2040: 189 versus 181 essential genes, and 513 versus 512 essential reactions (Fig. 4b). This observation is likely attributable to the reduced metabolic robustness of SrapUV2010 as a result of losing the 30 reactions. For example, NAD kinase is encoded by two paralogous nadK genes in the wild-type, but only one nadK gene remains in the SRMK07 strain. Therefore, this single nadK gene becomes essential in SrapUV2010 upon gene deletion in silico. Also, the one additional essential reaction in SrapUV2010 corresponds to the reaction LIPOCT catalyzed by lipoyl(octanoyl) transferase. Likewise, LIPOCT is a non-essential reaction in the wild-type, but became essential as a result of deleting the reaction OCTNLL that is catalyzed by octanoate non-lipoylated apo domain ligase in 'lipoate metabolism'. Both OCTNLL and LIPOCT contribute to the biosynthesis of lipoate, which is an essential cofactor for 2-oxoacid dehydrogenases and the glycine cleavage system in central carbon metabolism 41 . Next, parsimonious flux balance analysis (pFBA) was implemented for SrapWT2040 and SrapUV2010 to gain insights into their intracellular flux distributions when producing rapamycin.
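The nadK example from the essentiality analysis above can be sketched without a full model. The snippet below is a simplified stand-in for the in silico gene deletions performed on the GEMs: it evaluates only a gene-protein-reaction rule for one reaction that is assumed to be required for growth, and the locus names nadK_1 and nadK_2 are hypothetical.

```python
def reaction_active(gpr, present_genes):
    """Evaluate a gene-protein-reaction rule.

    gpr is a list of alternative gene sets: alternatives (isozymes or
    paralogs) are OR-ed, and genes within one set (subunits) are AND-ed.
    """
    return any(all(gene in present_genes for gene in gene_set)
               for gene_set in gpr)


def essential_genes(gpr, genome_genes):
    """Genes whose single deletion switches off a reaction that is
    assumed to be required for growth."""
    if not reaction_active(gpr, genome_genes):
        return []
    return sorted(gene for gene in genome_genes
                  if not reaction_active(gpr, genome_genes - {gene}))


# Hypothetical locus names for the two NAD kinase paralogs.
nadk_rule = [{"nadK_1"}, {"nadK_2"}]
wild_type_genes = {"nadK_1", "nadK_2"}
mutant_genes = {"nadK_1"}  # one paralog lost with the deleted regions

print(essential_genes(nadk_rule, wild_type_genes))
print(essential_genes(nadk_rule, mutant_genes))
```

In the wild-type either paralog can carry the reaction, so neither gene is essential on its own; once one copy is gone, the remaining gene becomes essential, which is the loss of robustness described in the text.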
The pFBA simulation revealed two reactions with greater flux values in SrapUV2010, ME2 (catalyzed by NADP-dependent malic enzyme) and G3PD2 (NADP-dependent glycerol-3-phosphate dehydrogenase) (Fig. 5). ME2 and G3PD2 both produce NADPH, which is a required cofactor for the biosynthesis of various secondary metabolites 42,43 ; greater activities of these two corresponding enzymes could be another factor for the SRMK07 strain's enhanced rapamycin production performance. According to a previous study, overexpressing sco5261 for the ME2 reaction increased the production of a secondary metabolite actinorhodin in S. coelicolor 44 . Genes for these two reactions have never been targeted for the enhanced production of rapamycin, and thus, can be considered as overexpression targets for metabolic engineering of S. rapamycinicus in the future.

Discussion
In this study, we conducted a comparative genomic analysis for S. rapamycinicus NRRL 5491 and its mutant strain SRMK07 that overproduces rapamycin. For this, both strains were subjected to WGS, which subsequently allowed the identification of large deletions at both end regions of the SRMK07 genome as well as the potentially duplicated region that covers the rapamycin BGC. The duplication of the rapamycin BGC as well as the deletion of the extremities of the chromosome that includes many BGCs are likely to have positive effects on the rapamycin biosynthesis. Obviously, duplicated rapamycin BGC would increase the dosage of rapamycin biosynthetic and regulatory genes, contributing to the enhanced biosynthesis of this molecule. Also, in the absence of multiple BGCs, precursors and energy used for the biosynthesis of the corresponding molecules may be redirected toward the rapamycin biosynthesis, further improving the rapamycin production. Core gene analysis was additionally conducted using ARTS to explain the SRMK07 strain's normal growth despite the large genomic deletions. Finally, GEMs of the wild-type and the SRMK07 strain were reconstructed to examine these two strains' metabolic differences.
This study suggests future research opportunities in metabolic engineering for the enhanced production of rapamycin and other secondary metabolites. Previous studies have shown the benefits of genome reduction, which reduces biological complexity, increases genome stability, and improves the production of secondary metabolites 45 . Relevant examples include Streptomyces avermitilis SUKA17 producing streptomycin and cephamycin 46 , S. coelicolor M1152 and M1154, both strains producing actinorhodin and chloramphenicol 47 , and Streptomyces sp. FR-008 LQ3 as a chassis for heterologous expression of biosynthetic genes for secondary metabolites 48 . Therefore, the SRMK07 strain can also be considered as a promising platform to construct a novel superhost for further enhanced production of rapamycin and other secondary metabolites 49,50 . Also, in addition to the two reactions ME2 (catalyzed by NADP-dependent malic enzyme) and G3PD2 (NADP-dependent glycerol-3-phosphate dehydrogenase) that can be considered as overexpression targets, further gene manipulation targets can be systematically predicted via comprehensive simulation studies using the GEMs reconstructed. Finally, additional omics analyses, for example transcriptome and metabolome, in combination with genomic and metabolic network analyses would provide more comprehensive phenotypic profiles of the wild-type and the SRMK07 strain.
[Figure caption fragment] […] Table S4) and 19 individual nitrogen sources (Supplementary Table S5). It should be noted that SrapUV2010 also generated the same prediction accuracy as SrapWT2040.

Conclusions
Comparative genomic analysis conducted in this study generated biological clues that could explain the enhanced rapamycin production performance of the S. rapamycinicus SRMK07 strain that was previously generated via random mutagenesis. The genomic and computational approaches undertaken in this study suggest gene manipulation targets to further enhance the production of rapamycin that can be experimentally tested through metabolic engineering. The approaches in this study can also be considered for analyzing other mutant strains generated from random mutagenesis.

Methods
Strains. S. rapamycinicus NRRL 5491, the wild-type, and its rapamycin-overproducing mutant SRMK07 were used in this study. The SRMK07 strain was previously generated via UV-based random mutagenesis 24 . The wild-type spores were resuspended in saline buffer (0.85% NaCl and 0.1% Tween 90) to dilute the concentration to about 10^8/mL. Next, 100 µL of the diluted spores were spread evenly on an M1 plate (2.5 g/L corn steep powder, 3 g/L yeast extract, 3 g/L CaCO3, 0.3 g/L FeSO4, 10 g/L wheat starch, and 20 g/L agar), and exposed to UV for 60 s. UV conditions were set to 254 nm wavelength, 40 W, and 25-30 cm distance from the spore suspension to achieve a 99% killing rate. The UV-treated spores were incubated at 28 °C for 7-10 days using agar plates containing 2 g/L rapamycin, which allowed the screening of rapamycin-resistant strains. The SRMK07 strain used in this study was selected by measuring the rapamycin produced by the resistant strains in liquid culture in a 250 mL flask.
Cultivation conditions. Seed cultures of the two strains were incubated in GYM medium for 3 days, and transferred to a main cultivation medium. Flask cultivations were carried out for 14 days at 28 °C and 250 rpm (Fig. 1a,b). GYM medium contains: 4 g/L glucose, 4 g/L yeast extract, and 10 g/L malt extract. The main medium used in flask cultivations was adopted from Yun et al. 7 with a slight modification. The main medium contains: 10 g/L M100, 50 g/L glycerol, 10 g/L cottonseed meal, 10 g/L soybean meal, 6.5 g/L yeast extract, 5 g/L (NH4)2SO4, 20 g/L L-lysine, 4 g/L L-tyrosine, 0.7 g/L KH2PO4, 1.14 g/L K2HPO4, 5 g/L NaCl, 0.05 g/L FeSO4·7H2O, and 42.6 g/L MES. The two strains were also cultured on solid media in order to find differences in their growth phenotypes with a focus on morphology and sporulation. ISP2 plate (4 g/L glucose, 4 g/L yeast extract, 10 g/L malt extract, and 20 g/L agar) was used to compare general growth characteristics of the two strains. M1 plate was used to examine sporulation of the two strains. The two strains were grown on the solid media for 7 days.

Measurement of rapamycin concentration and packed mycelium volume (PMV). During the flask cultivations, 500 μL of culture broth was sampled and used for the measurement of rapamycin concentration. For this, culture broth was mixed with methanol in a 1:1 ratio, and vortexed for 30 min. The mixed solutions were subsequently centrifuged, and the supernatants were collected for the analysis using Waters 2695 (Waters, Milford, MA) high-performance liquid chromatography (HPLC) equipped with Agilent Eclipse XDB-C18 column (Agilent Technologies, Santa Clara, CA) and Waters 2487 detector (Waters, Milford, MA). In this HPLC analysis, water and acetonitrile were used as a mobile phase with the ratio varied from 80:20 (v/v) to 20:80 (v/v) at a 1 mL/min flow rate, and 277 nm wavelength was used for the detector. Detected peaks were compared with a peak of the standard rapamycin compound (Sigma-Aldrich, St. Louis, MO) to measure the rapamycin concentration.
Packed mycelium volume (PMV) was used to estimate the growth of the NRRL 5491 and SRMK07 strains (Fig. 1b) because it was difficult to measure optical density or dry cell weight (DCW) from the insoluble media used for the rapamycin production 51,52 . For this, the collected culture broth (each 5 mL) was centrifuged at 3000 × g for 20 min. PMV was expressed as a percentage (%) by dividing the volume of the packed mycelium pellet by the sample volume (5 mL).
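The PMV percentage is a simple ratio of pellet volume to sample volume. A minimal sketch follows, in which the function name and the example pellet volume are hypothetical and only the 5 mL sample volume comes from the text.

```python
def pmv_percent(pellet_volume_ml, sample_volume_ml=5.0):
    """Packed mycelium volume as a percentage of the sampled broth volume."""
    if sample_volume_ml <= 0:
        raise ValueError("sample volume must be positive")
    if not 0 <= pellet_volume_ml <= sample_volume_ml:
        raise ValueError("pellet volume must lie between 0 and the sample volume")
    return 100.0 * pellet_volume_ml / sample_volume_ml


# A hypothetical 1.0 mL pellet from a 5 mL sample.
example_pmv = pmv_percent(1.0)
print(example_pmv)
```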
Whole genome sequencing. For WGS, the wild-type and SRMK07 strain were incubated in tryptic soy broth (TSB) medium (17 g/L tryptone, 3 g/L soytone, 2.5 g/L glucose, 5 g/L NaCl, and 2.5 g/L K2HPO4) for 3-4 days, and gDNA samples of the two strains were extracted using Wizard Genomic DNA Purification Kit (Promega, Madison, WI). Next, WGS and genome annotation of the two strains were conducted at DNA Link, Inc. (Seoul, Korea) by using the PacBio (Pacific Biosciences, Menlo Park, CA) and Illumina (Illumina Inc., San Diego, CA) platforms together. To increase the genome sequence quality, the genome correction method suggested by Lee et al. 53 was used; if more than 80% of Illumina reads for a specific genomic site conflicted with the PacBio results, the sequence at that site was substituted according to the Illumina results using CLC Genomics Workbench version 6.5.1 (CLC bio, Aarhus, Denmark).
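The 80% correction rule can be expressed as a per-site decision. The sketch below is not the CLC Genomics Workbench procedure; it assumes that the Illumina bases covering a site are already available as a list and that, when the threshold is exceeded, the most frequent conflicting base is taken as the Illumina result. Both points, and the function name, are assumptions.

```python
from collections import Counter


def corrected_base(pacbio_base, illumina_bases, threshold=0.80):
    """Return the base to keep at one genomic site.

    The PacBio call is replaced only when MORE than `threshold` of the
    Illumina reads covering the site disagree with it.
    """
    if not illumina_bases:
        return pacbio_base  # no Illumina evidence, keep the PacBio call
    conflicting = [base for base in illumina_bases if base != pacbio_base]
    if len(conflicting) / len(illumina_bases) > threshold:
        # Assumption: the most common conflicting base is the Illumina call.
        return Counter(conflicting).most_common(1)[0][0]
    return pacbio_base


# 9 of 10 reads disagree (90%): the site is substituted.
site_1 = corrected_base("A", ["G"] * 9 + ["A"])
# 8 of 10 reads disagree (exactly 80%, not more): the PacBio call is kept.
site_2 = corrected_base("A", ["G"] * 8 + ["A"] * 2)
print(site_1, site_2)
```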
Real-time PCR (qPCR) for verifying the potentially duplicated region in the SRMK07 strain's genome. To verify the potentially duplicated region in the SRMK07 strain's genome, five genes known to exist as a single copy in more than 100 Streptomyces species were selected based on OrthoDB 28 (https://www.orthodb.org), and eight genes that represent the potentially duplicated region were selected based on our genome annotation results (Fig. 3). The qPCR experiments were conducted in accordance with the manufacturer's protocol using gDNA of the two strains and SYBR Green PCR Master Mix (Thermo Fisher Scientific, Waltham, MA).

Analysis of biosynthetic gene clusters (BGCs) and core genes. BGCs and core genes of the wild-type and SRMK07 strain were analyzed using antiSMASH version 5.0 27 (http://antismash.secondarymetabolites.org) and ARTS (Antibiotic Resistant Target Seeker) 31 version 2 (https://arts.ziemertlab.com), respectively. antiSMASH was implemented using the default options and 'loose' detection strictness. ARTS was also implemented with the default options with 'Actinobacteria' for 'Reference set'.
Generation of draft genome-scale metabolic models (GEMs). The draft GEMs that represent the metabolism of the NRRL 5491 and SRMK07 strains were generated using a Python-based GEM reconstruction tool that was previously released as a feature of antiSMASH 3.0 54 . The Python-based GEM reconstruction tool requires protein sequences and their corresponding enzyme commission (EC) numbers for a target organism as well as a high-quality GEM of a biologically close organism in order to build a draft GEM. In this study, EC numbers for protein sequences from the wild-type and SRMK07 strain were predicted using DeepEC, a deep learning-based EC number prediction tool 55 . A high-quality GEM of S. coelicolor, iKS1317 56 , was used as a template GEM. The resulting draft GEMs were generated in Systems Biology Markup Language (SBML), and further model refinement and simulations were implemented using COBRApy 57 .
Refinement of the draft […] individual nitrogen sources 58 (Fig. 4c, Supplementary Tables S4, S5). Simulation of the draft GEM showed reasonably high accuracy, 78%, in comparison with the experimental growth data. Next, information on the rapamycin biosynthetic pathway, involving 14 condensation steps, a ring closure step, and post-polyketide synthase modification steps 22 , was added to the draft GEM. This rapamycin biosynthetic pathway was expressed as a single stoichiometric equation by implementing Biosynthetic Gene cluster Metabolic pathway Construction (BiG-MeC), a pipeline that helps to create a metabolic pathway for a biosynthetic gene cluster encoding polyketides and non-ribosomal peptides 59 . Finally, MEMOTE (i.e., metabolic model tests) 60 was implemented to evaluate the quality of the draft GEM. The same procedure was undertaken for the GEM representing the SRMK07 strain.
Prediction of intracellular metabolic flux distributions using the GEMs. Parsimonious flux balance analysis (pFBA) was implemented to predict intracellular metabolic flux distributions because of its robust predictive power 61 . For pFBA of SrapWT2040, rapamycin production rate of 5 × 10^-4 mmol/g DCW/h and glycerol uptake rate of 0.8 mmol/g DCW/h were provided as constraints based on a previous study 62 . For SrapUV2010, rapamycin production rate and glycerol uptake rate were set to 2.0 × 10^-3 mmol/g DCW/h and 1.0 mmol/g DCW/h, respectively; rapamycin production rate and glycerol uptake rate were adopted from our cultivation experiments (Fig. 1) and Wang et al. 63 .
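For reference, pFBA is usually stated as a two-stage optimization, written out below in its general textbook form rather than as a transcription of the simulations in this study. The symbols are assumptions introduced here: S is the stoichiometric matrix, v the flux vector with bounds v^lb and v^ub, and c selects the stage-1 objective, which is taken to be biomass formation although the text does not name the objective explicitly. The fixed rates are the SrapWT2040 constraints given above; for SrapUV2010 they would be replaced by 2.0 × 10^-3 and 1.0 mmol/g DCW/h.

```latex
% Stage 1: flux balance analysis with the measured rates imposed
\begin{aligned}
Z^{*} \;=\; \max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}, \\
& v_{\mathrm{rapamycin}} = 5 \times 10^{-4}, \qquad
  v_{\mathrm{glycerol\ uptake}} = 0.8
  \quad (\text{mmol/g DCW/h})
\end{aligned}

% Stage 2: among all optimal solutions, pick the one with minimal total flux
\begin{aligned}
\min_{v}\quad & \sum_{i} \lvert v_{i} \rvert \\
\text{s.t.}\quad & S\,v = 0, \qquad c^{\top} v = Z^{*}, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}, \\
& v_{\mathrm{rapamycin}} = 5 \times 10^{-4}, \qquad
  v_{\mathrm{glycerol\ uptake}} = 0.8
\end{aligned}
```

The second stage is what makes the predicted flux distribution parsimonious: it removes flux through loops and redundant routes that do not change the stage-1 optimum, so differences between the two models, such as the higher ME2 and G3PD2 fluxes, are less likely to be artifacts of alternative optima.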

Data availability
All the data generated or analyzed during this study, including genome-scale metabolic models (GEMs), are available as Supplementary Information files. Genome sequences of S. rapamycinicus NRRL 5491 and its mutant SRMK07 have been deposited in NCBI GenBank under accession numbers CP085193 and CP085309.